Applying Low-Overhead Rollback-Recovery to Wide Area Distributed Query Processing
Abstract
It is argued that there is a significant class of pipelined large grain data flow computations whose wide area distribution and long running nature suggest a need for fault-tolerance, but for which existing approaches appear either costly or incomplete. This paper presents an approach which exploits limited input from the application layer to implement a low-overhead recovery protocol for such data flow computations. Over a large range of possible data flow graphs, the protocol supports tolerance of a single machine failure per execution of the computation, and in many cases a greater degree of fault-tolerance. The protocol is implemented within an emulation of a distributed query processing system. Preliminary performance measurements suggest that the overhead is indeed low.
Keywords: data flow, fault-tolerance, measurement, query processing, rollback-recovery, wide area
Similar resources
A Rollback-Recovery Protocol for Wide Area Pipelined Data Flow Computations
It is argued that there is a significant class of pipelined large grain data flow computations whose wide area distribution and long running nature suggest a need for fault-tolerance, but for which existing approaches appear either costly or incomplete. An example, which motivated this paper, is the execution of queries over distributed databases. This paper presents an approach which exploits ...
Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit
Manetho is a new transparent rollback-recovery protocol for long-running distributed computations. It uses a novel combination of antecedence graph maintenance, uncoordinated checkpointing, and sender-based message logging. Manetho simultaneously achieves the advantages of pessimistic message logging, namely limited rollback and fast output commit, and the advantage of optimistic message logging, nam...
Checkpointing and Rollback of Wide-area Distributed Applications using Mobile Agents
We consider the problem of designing rollback error recovery algorithms for dynamic, wide area distributed systems like the Internet. The characteristics and the scale of such a system complicate the design and performance of the algorithms. Traditional message passing based algorithms incur large overhead, in both the network traffic and message passing delay, in such a wide-area environment. ...
Checkpointing and Recovery Algorithms Using Mobile Agents on a Hamiltonian Topology
Traditional message passing based checkpointing and rollback recovery algorithms perform well for closely coupled systems. In wide area distributed systems these algorithms may incur large overhead due to message passing delay and network traffic. Mobile agents are therefore introduced to design checkpointing and rollback recovery algorithms for wide area distributed systems. Network topology is assu...
Implementation and Performance of Transparent Rollback-recovery in Manetho
We describe the implementation and performance of rollback-recovery in Manetho. During failure-free operation, Manetho maintains an antecedence graph which records the "happened before" relation between certain events in the distributed computation. The antecedence graph is used in combination with checkpointing and volatile sender-based message logging to simultaneously achieve low failure-fre...